Extension of CART using multiple splits under order restrictions

Authors

  • R. Strobl
  • G. Salanti
  • K. Ulm

Abstract

CART was introduced by Breiman et al. (1984) as a classification tool. It recursively divides the whole sample into two subpopulations by finding the best possible split with respect to an optimisation criterion. This method, until now restricted to binary splits, is extended in this paper to allow multiple splits as well. The main problem with this extension is determining the optimal number of splits and the location of the corresponding cutpoints. To reduce the computational effort and enhance parsimony, reduced isotonic regression was used to solve this problem. The extended CART method was tested in a simulation study and was compared with the classical approach in an epidemiological study. In both studies the extended CART turned out to be a useful and reliable alternative.
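
As a rough illustration of how an order restriction can yield several cutpoints at once, the sketch below fits a plain non-decreasing isotonic regression with the pool-adjacent-violators algorithm and reads off a cutpoint wherever the fitted value jumps. This is not the authors' procedure: the paper uses reduced isotonic regression, whose step for reducing the number of level sets is not described here and is not implemented. The function names `pava` and `cutpoints` are hypothetical, and distinct predictor values are assumed.

```python
from typing import List


def pava(y: List[float]) -> List[float]:
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    # Each block stores [sum of values, count]; the newest block is merged
    # into its predecessor while the predecessor's mean is larger.
    blocks: List[List[float]] = []
    for value in y:
        blocks.append([value, 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * int(count))
    return fitted


def cutpoints(x: List[float], y: List[float]) -> List[float]:
    """Midpoints between neighbouring x values where the isotonic fit jumps.

    Assumes the x values are distinct (an assumption of this sketch).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    fit = pava([y[i] for i in order])
    return [
        (xs[i] + xs[i + 1]) / 2
        for i in range(len(xs) - 1)
        if fit[i + 1] > fit[i]
    ]
```

For example, with predictor values 1 to 6 and binary outcomes 0, 1, 0, 1, 1, 0, the fit has three level sets, so `cutpoints` returns two cutpoints, 1.5 and 3.5, which corresponds to a three-way split of the node.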

Similar resources

Comparing ANN and CART to Model Multiple Land Use Changes: A Case Study of Sari and Ghaem-Shahr Cities in Iran

Most land use change modelers have modeled binary land use change rather than multiple land use changes. As a first objective of this study, we compared two well-known LUC models, classification and regression tree (CART) and artificial neural network (ANN), from two groups of data mining tools, global parametric and local non-parametric models, to model multiple LUCs. The ca...

Experimental Evaluation of Algorithmic Effort Estimation Models using Projects Clustering

One of the most important aspects of software project management is the estimation of the cost and time required for running an information system. Therefore, software managers try to carry out estimation based on behavior, properties, and project restrictions. Software cost estimation refers to the process of predicting the development requirements of a software system. Various kinds of effort estimation patter...

Discussion of the paper Bayesian Treed Generalized Linear Models

In this stimulating paper, the authors have successfully exploited Markov chain Monte Carlo methods to explore the space of graphs for CART-like trees in which the terminal nodes represent generalized linear models (GLMs). Integration over the parameters of the terminal GLMs, in order to compute the marginal likelihood (probability of data given the model) for the MCMC search, is accomplished u...

Evaluation of wheat genotypes under tillage practices: application of technique for order preference by similarity to ideal solution method

Adoption of conservative agriculture at farm level is associated with reduced production costs and leads to crop yield stability. The aim of this study was to prioritize experimental treatments based on different criteria by applying the "technique for order preference by similarity to ideal solution" (TOPSIS). A field experiment was carried out at Zarghan research station, Fars province, Iran,...

Using Pairs of Data-Points to Define Splits for Decision Trees

Conventional binary classification trees such as CART either split the data using axis-aligned hyperplanes or they perform a computationally expensive search in the continuous space of hyperplanes with unrestricted orientations. We show that the limitations of the former can be overcome without resorting to the latter. For every pair of training data-points, there is one hyperplane that is orth...

Publication date: 2007